Chapter 3 Exploratory analysis
Below are the data fields and field descriptions for all variables in the dataset.
Variables related to the trade process:
- control_number: It represents a unique individual shipment processed by the USFWS.
- quantity: It represents the numeric quantity of the wildlife product
- unit: It represents the unit for the numeric quantity
- import_export: It represents whether the shipment is an (I)mport or (E)xport
- action: Action taken by USFWS on import ((C)leared/(R)efused)
- shipment_date: Full date when shipment arrived
- shipment_year: Year when the shipment arrived (derived from “shipment_date”)
- disposition: Fate of the import
- disposition_date: Full date when disposition occurred
- disposition_year: Year when disposition occurred (derived from “disposition_date”)
Variables related to the countries:
- country_origin: It represents the code for the country of origin of the wildlife product
- country_imp_exp: It represents the code for the country to/from which the wildlife product is shipped
- port: It represents the port or region of shipment entry
- us_co: It represents the US party of the shipment
- foreign_co: It represents the foreign party of the shipment
Variables related to the product:
- description: It represents the type/form of the wildlife product
- value: It represents the reported value of the wildlife product in US dollars
- purpose: It represents the reason the wildlife product is being imported
- source: It represents the type of source within the origin country (e.g., wild, bred)
- species_code: It represents the USFWS code for the wildlife product
- taxa: It represents the USFWS-derived broad taxonomic categorization
- class: It represents the EHA-derived class-level taxonomic designation
- genus: It represents the genus (or higher-level taxonomic name) of the wildlife product
- species: It represents the species of the wildlife product
- subspecies: It represents the subspecies of the wildlife product
- specific_name: It represents a specific common name for the wildlife product
- generic_name: It represents a general common name for the wildlife product
3.1 Variables related to the trade process
3.1.1 control_number
It represents a unique individual shipment processed by the USFWS. A shipment refers to any container or group of containers that share a common control number, country of shipment and shipment date. A shipment may be represented in the LEMIS dataset by multiple segments (rows) when its contents include more than one species or type of product.
- There are 2,079,637 unique shipments containing 5,451,832 segments.
- Of those, 60% (1,265,491 unique shipments) consist of a single segment, meaning that each of these shipments contained only one species and one type of product. The remaining 40% (814,146 unique shipments) contain 4,186,341 segments.
- These multi-segment shipments contain between 2 and 492 segments, with a mean of 5 segments per shipment and a median of 3.
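The figures above come from counting rows per `control_number`. As a minimal base R sketch of that arithmetic, the block below uses a small hypothetical vector `ctrl` (not the LEMIS data) with one entry per segment; all variable names are illustrative.

```r
# Hypothetical control numbers: one entry per segment (row), not real LEMIS data
ctrl <- c("A1", "A2", "A2", "A3", "A3", "A3", "A4", "A5", "A5", "A5", "A5")

# Segments per unique shipment; as.integer() drops the table attributes
seg <- as.integer(table(ctrl))

n_shipments <- length(seg)   # number of unique shipments
n_segments  <- sum(seg)      # number of rows

single   <- seg[seg == 1]    # shipments with exactly one segment
multiple <- seg[seg > 1]     # shipments with two or more segments

share_single <- length(single) / n_shipments

# Size distribution of the multi-segment shipments
multi_range  <- range(multiple)
multi_mean   <- mean(multiple)
multi_median <- median(multiple)

# Single-segment rows plus multi-segment rows must add up to all rows
stopifnot(length(single) + sum(multiple) == n_segments)
```

With these toy values, 2 of the 5 shipments have a single segment (40%), and the multi-segment shipments range from 2 to 4 segments. The same identity holds for the real figures: 1,265,491 + 4,186,341 = 5,451,832.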
# Dropping unused levels
data$control_number <- droplevels(data$control_number)
# Unique shipments
shipments<- data %>% dplyr::group_by(control_number) %>%
dplyr::summarise(size_segments=n()) %>% ungroup() %>%
mutate(type_segment = ifelse(size_segments==1, "single", "multiple")) %>%
mutate(type_segment=as.factor(type_segment))
shipments<- shipments %>% select(size_segments, type_segment) %>%
mutate(size_segments=as.factor(size_segments)) %>%
dplyr::group_by(size_segments) %>%
dplyr::summarise(total_shipments=n())
DT::datatable(shipments, rownames = FALSE,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 1: ',
htmltools::em('Number of unique shipments based on the amount of
segments'))) %>%
formatRound('total_shipments',1)
3.1.2 quantity and unit
Quantity represents the numeric quantity of the wildlife product, while unit represents the unit of measure for the numeric quantity.
- There are 13 types of units. 94.64% of the segments are measured in the unit “Number” (the number of individual wildlife items).
- The shipments include around 11.5 billion wildlife items, plus 1.1 billion kg of wildlife products measured only by weight.
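If `quantity` was read in as a factor, `as.numeric()` returns the factor's internal level codes rather than the recorded amounts, which would silently corrupt any totals. The sketch below, using hypothetical values and hypothetical unit labels, shows the pitfall and the usual workaround of converting through `as.character()` before summing per unit with `tapply()`.

```r
# Hypothetical quantities stored as a factor (e.g. read with stringsAsFactors = TRUE)
quantity <- factor(c("10", "250", "3", "12.5", "40"))
# Hypothetical unit labels for the same five segments
unit <- c("Number", "Number", "Number", "Kilograms", "Kilograms")

# Pitfall: on a factor, as.numeric() returns level codes, not the values
wrong <- as.numeric(quantity)

# Workaround: convert to character first, then to numeric
qty <- as.numeric(as.character(quantity))

# Total quantity per unit, ignoring missing values
totals <- tapply(qty, unit, sum, na.rm = TRUE)
totals
```

Totals are only meaningful within a unit: adding item counts to kilograms would mix incompatible measures, which is why quantities are reported per unit.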
units <-data %>% group_by(unit) %>%
summarise(total_segments = n(),percentage=n()/nrow(data)) %>%
drop_na(unit) %>%
arrange(desc(percentage))
DT::datatable(units,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 2: ',
htmltools::em('Units of measure'
))) %>%
formatRound('total_segments',1) %>%
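formatPercentage('percentage',2)
# Convert through character so a factor column yields its values, not its level codes
data$quantity <- as.numeric(as.character(data$quantity))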
quantity<- data %>%
dplyr::group_by(unit) %>%
drop_na(quantity) %>%
dplyr::summarise(quantity = sum(quantity, na.rm=TRUE)) %>%
arrange(desc(quantity))
DT::datatable(quantity,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 3: ',
htmltools::em('Quantity per unit'))) %>%
formatRound('quantity',1)
3.1.3 import_export
It represents whether the shipment is an (I)mport or (E)xport. In this dataset, all records are imports.
imports <-data %>% group_by(import_export) %>%
summarise(total = n(), percentage=n()/nrow(data)) %>%
arrange(desc(percentage))
DT::datatable(imports,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 4: ',
htmltools::em('Shipments: Imports and exports'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
imports %>% mutate(percentage=percentage*100) %>%
plot_ly(x=~reorder(import_export, desc(percentage)), y=~percentage, color=~import_export) %>%
add_bars() %>%
layout(title = "<b>Shipments: Imports and exports</b>",
xaxis= list(title= "<b>Imports and exports</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
3.1.4 action
Action taken by the USFWS on import ((C)leared/(R)efused)
- 98.28% of imports were cleared and just 1.73% were refused. A single shipment can contain both cleared and refused segments.
- Refused shipments mostly involved mammals (28%) and reptiles (27%)
- The main countries of export for refused shipments were Mexico (19%), Canada (9%) and China (8%)
- The country of origin of refused shipments is most often unknown (23%), followed by Mexico (14%), Canada (6%), Indonesia (6%) and China (6%)
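Because a shipment can mix cleared and refused segments, refusal rates depend on whether they are counted per segment or per shipment. This minimal sketch uses a hypothetical data frame `segments` to compute the segment-level share of refusals and to list the shipments whose segments received more than one action.

```r
# Hypothetical segments: one row per segment, with its shipment and action
segments <- data.frame(
  control_number = c("S1", "S1", "S2", "S3", "S3", "S3", "S4"),
  action = c("Cleared", "Refused", "Cleared", "Cleared", "Cleared",
             "Refused", "Refused"),
  stringsAsFactors = FALSE
)

# Segment-level share of refusals (same denominator as the percentages above)
refused_share <- mean(segments$action == "Refused")

# Number of distinct actions recorded within each shipment
actions_per_shipment <- tapply(segments$action, segments$control_number,
                               function(a) length(unique(a)))

# Shipments containing both cleared and refused segments
mixed <- names(actions_per_shipment)[actions_per_shipment > 1]
mixed
```

Here 3 of 7 segments are refused, and shipments S1 and S3 each contain both outcomes, so a shipment-level refusal rate would differ from the segment-level one.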
action <-data %>% group_by(action) %>%
summarise(total= n(), percentage=n()/nrow(data)) %>%
drop_na(action) %>%
arrange(desc(percentage))
DT::datatable(action,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 5: ',
htmltools::em('Action taken by the USFWS on imports'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
action %>% mutate(percentage=percentage*100) %>%
plot_ly(x=~reorder(action, desc(percentage)), y=~percentage, color=~action) %>%
add_bars() %>%
layout(title = "<b>Action taken by the USFWS on imports</b>",
xaxis= list(title= "<b>Actions</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
refused <- data %>% filter(action=="Refused")
# Country of origin
refused %>% group_by(country_origin) %>%
summarise(total=n(), percentage=n()/nrow(refused)*100) %>%
arrange(desc(percentage)) %>%
top_n(10, percentage) %>%
plot_ly(x=~reorder(country_origin, desc(percentage)), y=~percentage,
color=~country_origin) %>%
add_bars() %>%
layout(title = "<b>Country of origin of refused shipments</b>",
xaxis= list(title= "<b>Country of origin</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
# Country of export
refused %>% group_by(country_imp_exp) %>%
summarise(total=n(), percentage=n()/nrow(refused)*100) %>%
arrange(desc(percentage)) %>%
top_n(10, percentage) %>%
plot_ly(x=~reorder(country_imp_exp, desc(percentage)), y=~percentage,
color=~country_imp_exp) %>%
add_bars() %>%
layout(title = "<b>Country of export of refused shipments</b>",
xaxis= list(title= "<b>Country of export</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
# Taxa
refused %>% group_by(taxa) %>%
summarise(total=n(), percentage=n()/nrow(refused)*100) %>%
arrange(desc(percentage)) %>%
top_n(10, percentage) %>%
plot_ly(x=~reorder(taxa, desc(percentage)), y=~percentage,
color=~taxa) %>%
add_bars() %>%
layout(title = "<b>Taxa of refused shipments</b>",
xaxis= list(title= "<b>Taxa</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
3.1.5 disposition
It represents the fate of the import
- There are 5 categories: C, S, A, R and a non-standard value
- The C category represents 98.3% of the data
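To make tables easier to read, the single-letter codes can be decoded with a named lookup vector. In this sketch the code meanings (A = abandoned, C = cleared, R = re-exported, S = seized) are an assumption based on the LEMIS codebook and should be verified against it; any other value is treated as non-standard, and the input vector is hypothetical.

```r
# Hypothetical disposition codes for seven segments
disposition <- c("C", "C", "S", "A", "C", "R", "X")

# Assumed code meanings; verify against the LEMIS codebook before use
code_labels <- c(A = "Abandoned", C = "Cleared", R = "Re-exported", S = "Seized")

# Unknown codes index to NA, which we relabel as non-standard
decoded <- unname(code_labels[disposition])
decoded[is.na(decoded)] <- "Non-standard value"

# Share of each category, largest first
shares <- sort(prop.table(table(decoded)), decreasing = TRUE)
shares
```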
disposition <-data %>% group_by(disposition) %>%
summarise(total = n(), percentage=n()/nrow(data)) %>%
drop_na(disposition) %>%
arrange(desc(percentage))
DT::datatable(disposition,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 6: ',
htmltools::em('Fate of the import'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
disposition %>% mutate(percentage=percentage*100) %>%
plot_ly(x=~reorder(disposition, desc(percentage)), y=~percentage, color=~disposition) %>%
add_bars() %>%
layout(title = "<b>Fate of the import</b>",
xaxis= list(title= "<b>Dispositions</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
3.1.6 shipment_date and disposition_date
shipment_date represents the date when the shipment arrived, while disposition_date represents the date when the disposition occurred.
- 54% of dispositions took place within a month of the shipment date (most of them within a week)
- While ‘shipment_date’ entries fell completely within the time period of 2000–2014, ‘disposition_date’ ranged more widely
- Users should be wary of any disposition date values that precede the associated shipment date, as we are unaware of how this could represent an accurate accounting of the product disposition process. However, for many potential analyses, differences in the date fields may not be a significant cause for concern because ‘shipment_date’ alone provides a sound index for those interested in temporal trends in wildlife trade
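A minimal sketch of the date checks described above, using hypothetical dates. Subtracting two `Date` vectors gives the lag in days. Negative lags flag records whose disposition precedes arrival, and the within-a-month share counts lags from 0 to 30 days, the same window used in the code that follows.

```r
# Hypothetical arrival and disposition dates for four records
shipment_date    <- as.Date(c("2005-03-01", "2005-03-01", "2010-07-15", "2012-01-10"))
disposition_date <- as.Date(c("2005-03-04", "2005-04-20", "2010-07-10", "2012-01-10"))

# Lag in days between arrival and disposition (Date subtraction gives days)
lag_days <- as.numeric(disposition_date - shipment_date)

# Records whose disposition precedes the shipment date: treat with caution
negative_lag <- which(lag_days < 0)

# Share of all records disposed of within a month (0 to 30 days)
within_month <- mean(lag_days >= 0 & lag_days <= 30)
```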
# Days between arrival and disposition, keeping lags of 0 to 30 days
days<- data %>%
mutate(days = as.numeric(disposition_date - shipment_date)) %>%
filter(days >= 0, days <= 30) %>%
group_by(days) %>%
summarise(total = n(), percentage=n()/nrow(data))
DT::datatable(days,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 7: ',
htmltools::em('Number of dispositions within a month of the shipment date'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
3.2 Variables related to the countries
3.2.1 country_origin
It represents the code for the country of origin of the wildlife product
- There are 252 countries of origin
- The top 15 account for 74.5% of the data
- The top 50 account for 94.4% of the data
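The "top N" figures above are cumulative shares of the most frequent categories. A base R sketch on a hypothetical vector `origin` of country codes: sort the frequency table, take the cumulative sum and divide by the total. The helper `top_n_share()` is hypothetical and unrelated to dplyr's `top_n()`.

```r
# Hypothetical country-of-origin codes, one per segment
origin <- c("CN", "CN", "CN", "MX", "MX", "CA", "ID", "CN", "MX", "BR")

# Frequency of each country, most frequent first
counts <- sort(table(origin), decreasing = TRUE)

# Cumulative share of rows covered by the top 1, 2, ... countries
coverage <- cumsum(as.integer(counts)) / sum(counts)

# Hypothetical helper: share of rows covered by the n most frequent countries
top_n_share <- function(n) coverage[min(n, length(coverage))]

top_n_share(2)
```

For this toy vector the top 1 and top 2 countries cover 40% and 70% of the rows; asking for more categories than exist returns 100%.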
# Dropping unused levels
data$country_origin <- droplevels(data$country_origin)
country_origin <-data %>% group_by(country_origin) %>%
summarise(total = n(), percentage=n()/nrow(data)) %>%
drop_na(country_origin) %>%
arrange(desc(percentage))
DT::datatable(country_origin,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 8: ',
htmltools::em('Country of origin of the wildlife product'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
country_origin %>% mutate(percentage=percentage*100) %>%
top_n(15, percentage) %>%
plot_ly(x=~reorder(country_origin, desc(percentage)), y=~percentage,
color=~country_origin) %>%
add_bars() %>%
layout(title = "<b>Country of origin of the wildlife product (Top 15)</b>",
xaxis= list(title= "<b>Country of origin</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
country_origin_illegal <-data %>%
filter(action=="Refused")
country_origin_illegal <- country_origin_illegal %>%
group_by(country_origin) %>%
summarise(total = n(), percentage=n()/nrow(country_origin_illegal)) %>%
drop_na(country_origin) %>%
arrange(desc(percentage))
DT::datatable(country_origin_illegal,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 9: ',
htmltools::em('Country of origin (illegal imports)'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
country_origin_illegal %>% mutate(percentage=percentage*100) %>%
top_n(15, percentage) %>%
plot_ly(x=~reorder(country_origin, desc(percentage)), y=~percentage,
color=~country_origin) %>%
add_bars() %>%
layout(title = "<b>Country of origin (illegal imports) (Top 15)</b>",
xaxis= list(title= "<b>Country of origin</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
3.2.2 country_imp_exp
It represents the code for the country to/from which the wildlife product is shipped
- There are 257 countries to/from which the product is shipped
- The top 15 account for 75.9% of the data
- The top 50 account for 95.9% of the data
# Dropping unused levels
data$country_imp_exp <- droplevels(data$country_imp_exp)
country_imp_exp <-data %>% group_by(country_imp_exp) %>%
summarise(total = n(), percentage=n()/nrow(data)) %>%
drop_na(country_imp_exp) %>%
arrange(desc(percentage))
DT::datatable(country_imp_exp,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 10: ',
htmltools::em('Country to/from which the wildlife product is shipped'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
country_imp_exp %>% mutate(percentage=percentage*100) %>%
top_n(15, percentage) %>%
plot_ly(x=~reorder(country_imp_exp, desc(percentage)), y=~percentage,
color=~country_imp_exp) %>%
add_bars() %>%
layout(title = "<b>Country to/from which the wildlife product is shipped (Top 15)</b>",
xaxis= list(title= "<b>Country of export</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
country_imp_exp_illegal <-data %>%
filter(action=="Refused")
country_imp_exp_illegal <- country_imp_exp_illegal %>%
group_by(country_imp_exp) %>%
summarise(total = n(), percentage=n()/nrow(country_imp_exp_illegal)) %>%
drop_na(country_imp_exp) %>%
arrange(desc(percentage))
DT::datatable(country_imp_exp_illegal,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 11: ',
htmltools::em('Country of export (illegal imports)'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
country_imp_exp_illegal %>% mutate(percentage=percentage*100) %>%
top_n(15, percentage) %>%
plot_ly(x=~reorder(country_imp_exp, desc(percentage)), y=~percentage,
color=~country_imp_exp) %>%
add_bars() %>%
layout(title = "<b>Country of export (illegal imports) (Top 15)</b>",
xaxis= list(title= "<b>Country of export</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
The most active countries appear on both sides (country of origin and country of export)
country_origin <- country_origin %>%
mutate(percentage=percentage*100) %>% top_n(4, percentage) %>%
rename(country = country_origin)
country_imp_exp <- country_imp_exp %>%
mutate(percentage=percentage*100) %>% top_n(4, percentage) %>%
rename(country = country_imp_exp)
# Stack both rankings; .id adds a "side" column naming the table each row came from
countries <- bind_rows(origin = country_origin, export = country_imp_exp,
.id = "side")
countries %>%
plot_ly(x=~reorder(country, desc(percentage)), y=~percentage,
color=~side) %>%
add_bars() %>%
layout(title = "<b>Most active countries</b>",
xaxis= list(title= "<b>Country</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
3.2.3 port
It represents the port of entry. Although there are around 328 ports of entry into the US, only 73 of them are represented in this dataset, because not all of them are covered by USFWS wildlife inspectors. By law, only 18 of them are designated ports with full-time inspectors.
Based on the number of declared imports, the top five ports of entry were Los Angeles (20%), New York (19%), Miami (6%), Newark (5%) and San Francisco (5%). The top 15 account for 84% of the data.
Based on the number of illegal imports, the top-five ports of entry were
# Dropping unused levels
data$port <- droplevels(data$port)
port <-data %>% group_by(port) %>%
summarise(total = n(), percentage=n()/nrow(data)) %>%
drop_na(port) %>%
arrange(desc(percentage))
DT::datatable(port,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 12: ',
htmltools::em('Port or region of shipment entry'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
port %>% mutate(percentage=percentage*100) %>%
top_n(15, percentage) %>%
plot_ly(x=~reorder(port, desc(percentage)), y=~percentage,
color=~port) %>%
add_bars() %>%
layout(title = "<b>Port of entry (Top 15)</b>",
xaxis= list(title= "<b>Port</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
port_illegal <-data %>%
filter(action=="Refused")
port_illegal <- port_illegal %>%
group_by(port) %>%
summarise(total = n(), percentage=n()/nrow(port_illegal)) %>%
drop_na(port) %>%
arrange(desc(percentage))
DT::datatable(port_illegal,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 13: ',
htmltools::em('Port of entry (illegal imports)'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
port_illegal %>% mutate(percentage=percentage*100) %>%
top_n(15, percentage) %>%
plot_ly(x=~reorder(port, desc(percentage)), y=~percentage,
color=~port) %>%
add_bars() %>%
layout(title = "<b>Port of entry (illegal imports) (Top 15)</b>",
xaxis= list(title= "<b>Port</b>" ,tickangle=-65),
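yaxis = list(title = "<b>Percentage</b>"))
3.2.4 Trade routes
A trade route is defined here as a combination of country of export (country_imp_exp) and port of entry (port).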
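The dplyr code that follows counts every country/port pair. The same idea in base R, sketched on hypothetical vectors: cross-tabulate the two columns, keep the non-empty combinations and rank them by frequency.

```r
# Hypothetical country of export and port of entry, one pair per segment
route_country <- c("CN", "CN", "MX", "CN", "MX", "CA")
route_port    <- c("Los Angeles", "Los Angeles", "Miami", "New York", "Miami", "New York")

# Cross-tabulate and flatten to one row per country/port combination
routes <- as.data.frame(table(country = route_country, port = route_port),
                        stringsAsFactors = FALSE)

# Keep observed routes only, add their share and sort by frequency
routes <- routes[routes$Freq > 0, ]
routes$percentage <- routes$Freq / length(route_country)
routes <- routes[order(-routes$Freq), ]
routes
```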
trade_routes <-data %>% group_by(country_imp_exp, port) %>%
summarise(total = n(), percentage=n()/nrow(data)) %>%
drop_na(port, country_imp_exp) %>%
arrange(desc(percentage))
DT::datatable(trade_routes,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 14: ',
htmltools::em('Trade routes'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
3.2.5 us_co
It represents the US party of the shipment
- We excluded records whose US party is recorded as “EXEMPTIONS 6 AND 7(C)” (names withheld under these FOIA privacy exemptions).
- There are 126,052 US parties
- The top 15 account for only 10.3% of the data
- The top 50 account for only 19% of the data
us_co <-data %>%
filter(!us_co == "EXEMPTIONS 6 AND 7(C)") %>%
group_by(us_co) %>%
summarise(total = n(), percentage=n()/nrow(data)) %>%
drop_na(us_co) %>%
arrange(desc(percentage))
DT::datatable(us_co,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 15: ',
htmltools::em('US party of the shipment'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
us_co %>% mutate(percentage=percentage*100) %>%
top_n(20, percentage) %>%
plot_ly(x=~reorder(us_co, desc(percentage)), y=~percentage,
color=~us_co) %>%
add_bars() %>%
layout(title = "<b>US party of the shipment (Top 20)</b>",
xaxis= list(title= "<b>US party</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
We want to analyze which American corporations import the most, so we grouped the major corporations based on their company names.
We grouped most of the companies that together represent about 19% of the data, covering approximately 1,000,000 observations.
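The nested `ifelse()` chain below assigns the first label whose pattern matches. This is a sketch of the same first-match-wins rule using a named lookup vector, which is easier to extend and review. The `patterns` subset and the sample `us_co` values are hypothetical, and exact-name matches (the `%in%` cases) are not covered.

```r
# Hypothetical US party names
us_co <- c("PRADA USA CORP", "Gucci America Inc", "Smithsonian Institution",
           "QUALITY MARINE", "ACME PETS")

# Hypothetical subset of pattern -> label pairs, in priority order
patterns <- c("prada"          = "Prada",
              "gucci"          = "Gucci",
              "museum"         = "Museums",
              "smithsonian"    = "Museums",
              "quality marine" = "Quality Marine")

classify_party <- function(x, patterns) {
  out <- rep("Other", length(x))
  assigned <- rep(FALSE, length(x))
  for (p in names(patterns)) {
    # Only label rows that no earlier pattern has matched (first match wins)
    hit <- !assigned & grepl(p, x, ignore.case = TRUE)
    out[hit] <- patterns[[p]]
    assigned <- assigned | hit
  }
  out
}

corporation <- classify_party(us_co, patterns)
corporation
```

Because the loop only labels rows that are still unassigned, the order of `patterns` matters in the same way as the order of the nested `ifelse()` calls.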
data$corporation<- # Fashion/Luxury/Design products
ifelse(grepl("prada", data$us_co, ignore.case = TRUE), "Prada",
ifelse(grepl("ralph lauren", data$us_co, ignore.case = TRUE),
"Ralph Lauren",
ifelse(grepl("LOUIS VUITTON", data$us_co, ignore.case = TRUE),
"Louis Vuitton",
ifelse(grepl("MONCLER", data$us_co, ignore.case = TRUE),
"Moncler",
ifelse(grepl("BOTTEGA VENETA", data$us_co, ignore.case = TRUE),
"Bottega Veneta",
ifelse(grepl("RICHEMONT", data$us_co, ignore.case = TRUE),
"Richemont",
ifelse(grepl("FENDI", data$us_co, ignore.case = TRUE),
"Fendi",
ifelse(grepl("HERMES", data$us_co, ignore.case = TRUE),
"Hermès",
ifelse(grepl("GUCCI", data$us_co, ignore.case = TRUE),
"Gucci",
ifelse(grepl("Beeline", data$us_co, ignore.case = TRUE),
"Beeline Group",
ifelse(grepl("fossil partner", data$us_co, ignore.case = TRUE),
"Fossil Partners, L.P.",
ifelse(grepl("dfs", data$us_co, ignore.case = TRUE),
"DFS Group",
ifelse(grepl("gluck", data$us_co, ignore.case = TRUE),
"E. Gluck Corporation",
ifelse(grepl("ferragamo", data$us_co, ignore.case = TRUE),
"Salvatore Ferragamo",
ifelse(grepl("jacadi", data$us_co, ignore.case = TRUE),
"Jacadi",
ifelse(grepl("bomac", data$us_co, ignore.case = TRUE),
"Bomac International Corp",
ifelse(data$us_co %in% c("PIER I IMPORTS, INC.",
"PIER 1 IMPORTS, INC. "),
"Pier 1",
# Museums
ifelse(grepl("museum", data$us_co, ignore.case = TRUE),
"Museums",
ifelse(grepl("smithsonian", data$us_co, ignore.case = TRUE),
"Museums",
# Animals or animal products providers
ifelse(grepl("SEA DWELLING", data$us_co, ignore.case = TRUE),
"Sea Dwelling creatures",
ifelse(grepl("HIPPOCAMPE", data$us_co, ignore.case = TRUE),
"Hippocampe USA",
ifelse(grepl("AQUA-NAUTIC", data$us_co, ignore.case = TRUE),
"Aqua Nautic Specialist",
ifelse(grepl("UNDERWATER WORLD", data$us_co, ignore.case = TRUE),
"Underwater World",
ifelse(grepl("GOLDEN INA", data$us_co, ignore.case = TRUE),
"Golden Ina",
ifelse(grepl("QUALITY MARINE", data$us_co, ignore.case = TRUE),
"Quality Marine",
ifelse(grepl("Arsian", data$us_co, ignore.case = TRUE),
"Arsian Imports",
ifelse(grepl("aquarium arts", data$us_co, ignore.case = TRUE),
"Aquarium Arts",
ifelse(grepl("WALT SMITH", data$us_co, ignore.case = TRUE),
"Walt Smith International",
ifelse(grepl("all seas fisheries", data$us_co, ignore.case = TRUE),
"Allseas Fisheries",
ifelse(grepl("sun pet ltd", data$us_co, ignore.case = TRUE),
"Sun Pet LTD",
ifelse(grepl("pacific aqua farms", data$us_co, ignore.case = TRUE),
"Pacific Aquafarms",
ifelse(grepl("INTINENTAL", data$us_co, ignore.case = TRUE),
"Intinental Pri",
ifelse(grepl("AQUACO", data$us_co, ignore.case = TRUE),
"Aquaco",
ifelse(grepl("all seas marine", data$us_co, ignore.case = TRUE),
"Allseas Marine",
ifelse(grepl("transship discounts", data$us_co, ignore.case = TRUE),
"Transship Discounts LTD",
ifelse(grepl("SEGREST FARMS", data$us_co, ignore.case = TRUE),
"Segrest Farms",
ifelse(grepl("fish head", data$us_co, ignore.case = TRUE),
"Fish Heads Inc",
ifelse(grepl("holiday coral", data$us_co, ignore.case = TRUE),
"Holiday Coral Inc",
ifelse(grepl("pacific island imp", data$us_co, ignore.case = TRUE),
"Pacific Island Imports",
ifelse(grepl("PAN OCEAN AQUARIUM", data$us_co, ignore.case = TRUE),
"Pan Ocean Aquarium, Inc",
ifelse(grepl("SALTWATER INC.", data$us_co, ignore.case = TRUE),
"Saltwater Inc",
ifelse(grepl("saltwaterfish", data$us_co, ignore.case = TRUE),
"Saltwaterfish",
ifelse(grepl("golden sea int", data$us_co, ignore.case = TRUE),
"Golden Sea Inc",
ifelse(grepl("strictly reptiles", data$us_co, ignore.case = TRUE),
"Strictly Reptiles Inc",
ifelse(grepl("emark tropical", data$us_co, ignore.case = TRUE),
"Emark Tropical Imports, Inc",
ifelse(data$us_co %in% c("DOLPHIN INTERNATIONAL", "DOLPHIN INT'L",
"DOLPHIN INTERNAITONAL"),
"Dolphin International",
ifelse(data$us_co %in% c("a & m aquatics", "A&M AQUATICS"),
"A&M Aquatics",
ifelse(data$us_co %in% c("LPS LLC", "LPS, LLC", "LPS"),
"LPS LLC",
ifelse(data$us_co %in% c("APET, INC", "APET INC"),
"Apet Inc",
"Other")))))))))))))))))))))))))))))))))))))))))))))))))
data_corporations <- data %>%
mutate(corporation = as.factor(corporation))
# Let's classify these corporations
corp_fashion<- c("Beeline Group", "Bomac International Corp", "Bottega Veneta",
"DFS Group","E. Gluck Corporation", "Fendi",
"Fossil Partners, L.P.", "Gucci", "Hermès", "Jacadi",
"Louis Vuitton","Moncler", "Pier 1","Prada", "Ralph Lauren",
"Richemont", "Salvatore Ferragamo")
corp_animalproviders<-c("A&M Aquatics", "Aqua Nautic Specialist", "Aquaco",
"Aquarium Arts", "Allseas Fisheries", "Allseas Marine",
"Apet Inc", "Arsian Imports", "Dolphin International",
"Emark Tropical Imports, Inc", "Fish Heads Inc",
"Golden Ina", "Golden Sea Inc", "Hippocampe USA",
"Holiday Coral Inc", "Intinental Pri", "LPS LLC",
"Pacific Aquafarms", "Pacific Island Imports",
"Pan Ocean Aquarium, Inc", "Quality Marine",
"Saltwater Inc", "Saltwaterfish", "Sea Dwelling creatures",
"Segrest Farms", "Strictly Reptiles Inc", "Sun Pet LTD",
"Transship Discounts LTD", "Underwater World",
"Walt Smith International")
data$corp_classif<- ifelse(data$corporation %in% corp_fashion, "Fashion/Luxury/Design",
ifelse(data$corporation %in% corp_animalproviders,
"Animal/animal prod. providers",
ifelse(data$corporation %in% "Museums", "Museums", "Others")))
data$corporation<-as.factor(data$corporation)
data$corp_classif<- as.factor(data$corp_classif)
# Let's summarize and graph these corporations
data_corporations<- data %>%
group_by(corp_classif, corporation) %>%
dplyr::summarise(total=n(), percentage=n()/nrow(data)) %>%
# Drop the leftover corp_classif grouping so top_n() ranks all corporations together
ungroup() %>%
arrange(desc(total)) %>%
filter(corporation!="Other")
DT::datatable(data_corporations,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 16: ',
htmltools::em('Which American corporations are importing the most'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)
data_corporations %>% mutate(percentage=percentage*100) %>%
top_n(15, percentage) %>%
plot_ly(x=~reorder(corporation, desc(percentage)), y=~percentage,
color=~corp_classif) %>%
add_bars() %>%
layout(title = "<b>Which American corporations are importing the most (Top 15)</b>",
xaxis= list(title= "<b>American corporation</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
## Graph by classification
data_corporations %>% mutate(percentage=percentage*100) %>%
filter(corp_classif == "Fashion/Luxury/Design") %>%
plot_ly(x=~reorder(corporation, desc(percentage)), y=~percentage,
marker = list(color = "coral")) %>%
add_bars() %>%
layout(title = "<b>Fashion/Luxury/Design Corporations</b>",
xaxis= list(title= "<b>American corporation</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
data_corporations %>% mutate(percentage=percentage*100) %>%
filter(corp_classif == "Animal/animal prod. providers") %>%
plot_ly(x=~reorder(corporation, desc(percentage)), y=~percentage,
marker = list(color = "green")) %>%
add_bars() %>%
layout(title = "<b>Animal/animal prod. providers Corporations</b>",
xaxis= list(title= "<b>American corporation</b>" ,tickangle=-65),
yaxis = list(title = "<b>Percentage</b>"))
3.2.6 foreign_co
It represents the foreign party of the shipment
- There are 237,994 foreign parties
- The top 15 account for only 5.4% of the data
- The top 50 account for only 12.4% of the data
foreign_co <-data %>%
filter(!foreign_co == "EXEMPTIONS 6 AND 7(C)") %>%
group_by(foreign_co) %>%
summarise(total = n(), percentage=n()/nrow(data)) %>%
drop_na(foreign_co) %>%
arrange(desc(percentage))
DT::datatable(foreign_co,
caption = htmltools::tags$caption(
style='caption-side: bottom; text-align: center;','Table 17: ',
htmltools::em('Foreign party of the shipment'
))) %>%
formatRound('total',1) %>%
formatPercentage('percentage',2)